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Abstract 

We  develop  a  nonparametric  belief  propagation  method  for  uncertainty  quan¬ 
tification  and  apply  it  to  model  flow  in  random  porous  media.  The  rela¬ 
tionship  between  the  high-dimensional  input  and  the  multi-output  responses 
is  addressed  by  breaking  the  global  regression  problem  into  smaller  local 
problems  using  probabilistic  links.  These  links  can  be  well  represented  in  a 
probabilistic  graphical  model.  The  whole  framework  is  designed  to  be  non¬ 
parametric  (Gaussian  mixture)  in  order  to  capture  the  non-Gaussian  features 
of  the  response.  With  the  known  input  distribution,  a  loopy  nonparamet¬ 
ric  belief  propagation  algorithm  is  used  to  find  the  corresponding  marginal 
distributions  of  the  responses.  The  probabilistic  graphical  framework  can  be 
used  as  a  surrogate  model  to  predict  the  responses  for  new  input  realiza¬ 
tions  as  well  as  our  confidence  on  these  predictions.  Numerical  examples  are 
presented  to  show  the  accuracy  and  efficiency  of  the  probabilistic  graphical 
model  framework  and  nonparametric  belief  propagation  method  for  solving 
uncertainty  quantification  problems  in  flows  in  porous  media  using  stationary 
and  non-stationary  permeability  fields. 
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1.  Introduction 

Uncertainty  Quantification  (UQ)  of  multi-scale  and  multi-physics  sys¬ 
tems  is  a  field  of  great  interest  and  has  attracted  the  attention  of  many 
researchers  and  communities  in  recent  years.  However,  it  is  difficult  to  con¬ 
struct  a  complete  probabilistic  model  of  such  problems  mainly  because  they 
often  involve  high- dimensional  and  continuous  random  variables,  which  have 
complex,  multi-modal  distributions. 

Over  the  past  few  decades,  many  methods  and  algorithms  have  been  de¬ 
veloped  to  address  UQ  problems.  The  most  widely  used  method  is  the  Monte 
Carlo  (MC)  method.  MC’s  wide  acceptance  is  due  to  the  fact  that  it  can 
uncover  the  complete  statistics  of  the  solution,  while  having  a  convergence 
rate  that  is  (remarkably)  independent  of  the  input  dimension.  Nevertheless, 
it  quickly  becomes  inefficient  in  high  dimensional  and  computationally  inten¬ 
sive  problems,  where  only  a  few  samples  can  be  observed.  Other  methods  are 
attempting  to  construct  a  surrogate  model  for  the  complex  physical  system. 
The  idea  is  to  run  the  deterministic  physical  solver  on  a  small,  well-selected 
set  of  inputs  and  then  use  these  data  to  learn  the  response  surface,  so  that 
the  UQ  problem  can  be  studied  based  on  the  surrogate  instead  of  the  com¬ 
putationally  expensive  simulator.  Such  kind  of  methods  include,  the  adap¬ 
tive  sparse  grid  collocation  method  (AGSC)  [1],  the  multi- response  Gaussian 
process  method  (MGP)  [2],  adaptive  locally  weighted  projection  regression 
methods  (ALWPR)  [3],  and  so  on.  However,  all  these  methods  have  to  face 
the  “curse  of  dimensionality”  problem,  when  the  inputs  are  high- dimensional. 

Recently  developed  probabilistic  graphical  models  [4]  can  provide  a  pow¬ 
erful  framework  to  effectively  interpret  complex  probabilistic  problems  in¬ 
volving  many  inter-correlated  variables.  The  two  basic  elements  of  a  graphi¬ 
cal  model  are  its  nodes  and  edges.  The  nodes  represent  the  random  variables 
and  an  edge  linking  two  nodes  represents  the  correlation  between  them.  The 
joint  probability  distribution  can  be  accessed  by  decomposing  the  complex 
network  into  local  clusters  defined  by  connected  subsets  of  nodes.  Then, 
by  applying  appropriate  inference  algorithms,  the  marginal  and  conditional 
probabilities  of  interest  can  be  effectively  calculated. 
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The  probabilistic  graphical  model  has  been  used  in  a  range  of  applica¬ 
tion  domains,  which  include  web  search  [5],  medical  and  fault  diagnosis  [6], 
speech  recognition  [7],  robot  navigation  [8],  bioinformatics  [9],  communica¬ 
tions  [10],  natural  language  processing  [11],  computer  vision  [12],  and  many 
more.  Most  of  the  above  applications  have  demonstrated  the  excellent  per¬ 
formance  of  graphical  model  in  the  discrete  world  and  the  low- dimensional 
continuous  world.  However,  for  problems  involving  high-dimensional  contin¬ 
uous  variables,  the  number  of  efficient  and  accurate  algorithms  is  limited. 

The  simplest  way  to  investigate  continuous  graphical  models  is  discretiza¬ 
tion.  However,  for  problems  involving  high-dimensional  variables,  exhaustive 
discretization  of  the  random  space  is  computationally  infeasible.  Gaussian 
approximation  is  a  widely  used  technique  to  build  the  continuous  model, 
however,  the  performance  of  Gaussian  model  is  not  very  satisfactory,  espe¬ 
cially  for  problems  involving  non-Gaussian  features.  Therefore,  in  this  work, 
we  propose  to  build  a  nonparametric  (non-Gaussian)  graphical  model. 

The  general  procedure  of  studying  a  graphical  model  problem  can  be  sum¬ 
marized  as  follows:  (1)  Prepare  the  training  data  for  the  graphical  model;  (2) 
Design  the  structure  of  the  graphical  model;  (3)  Learn  the  graphical  model 
with  the  training  data,  including  designing  the  potential  functions  and  learn¬ 
ing  the  unknown  parameters;  (4)  Solve  an  inference  problem,  that  is,  find 
the  conditional  or  marginal  probabilities  of  interest.  In  this  work,  we  con¬ 
sider  a  graphical  model  that  interprets  the  probabilistic  relationship  between 
the  inputs  of  a  physical  system  with  the  corresponding  responses.  Generally 
speaking,  for  a  complex  physical  problem,  the  input  variables  are  the  key  fac¬ 
tors  which  determine  the  characteristics  or  the  performance  of  the  physical 
system  (response  variables).  For  example,  in  flow  in  porous  media,  the  input 
variable  is  the  discretized  representation  of  the  permeability  field,  while  the 
response  variables  are  the  physical  properties  such  as  velocity  and  pressure, 
which  are  strongly  influenced  by  the  inputs.  Model  reduction  techniques 
often  have  to  be  applied  first  to  the  input  due  to  its  high  dimensionality. 
All  the  unknown  parameters  in  the  graphical  model  can  be  learned  locally 
via  techniques  such  as  maximum  likelihood  (MLE),  or  maximum  a  posteriori 
probability  (MAP).  To  address  the  inference  problem  several  sampling  and 
variational  algorithms  can  be  applied.  Since  the  designed  graphical  model 
in  this  work  is  nonparametric,  a  sampling  based  nonparametric  belief  prop¬ 
agation  [13,  14]  algorithm  is  selected  to  carry  out  the  inference  task.  In  the 
following  sections,  we  will  discuss  how  we  build  the  graphical  model  and  how 
to  study  the  problems  of  interest  in  detail.  In  particular,  in  this  work,  the 
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problem  under  investigation  is  flow  through  porous  (heterogeneous)  media. 
We  are  interested  in  investigating  how  the  uncertainty  propagates  from  the 
input  permeability  field  to  the  corresponding  response  properties  field.  In 
addition,  we  want  to  build  a  surrogate  model  to  the  deterministic  solver  that 
will  allows  us  for  any  realization  of  the  input  permeability  to  predictive  the 
physical  responses  as  well  as  our  confidence  on  these  predictions  (induced  by 
the  limited  data  used  to  train  the  graphical  model). 

In  [15],  the  authors  proposed  a  probabilistic  graphical  model  for  multiscale 
stochastic  partial  differential  equations  (SPDEs)  that  focuses  on  the  corre¬ 
lation  between  physical  responses.  The  distribution  of  physical  responses 
conditioned  on  stochastic  input  was  approximated  using  conditional  random 
field  theories.  Different  physical  responses  (such  as  flux  and  pressure  in  flows 
in  heterogeneous  media)  are  correlated  in  such  a  way  that  their  interactions 
are  assumed  to  be  conditioned  on  fine-scale  local  properties.  No  model  re¬ 
duction  of  fine-scale  properties  was  involved  in  this  process.  The  influence  of 
fine-scale  properties  on  coarse-scale  responses  was  modeled  through  a  set  of 
hidden  variables.  The  approach  in  this  paper  is  significantly  different  in  mul¬ 
tiple  fronts:  (1)  an  undirected  graph  model  is  introduced  rather  than  a  factor 
graph  as  in  [15];  (2)  the  graphical  model  considers  output  responses  that  are 
independent  of  each  other;  (3)  an  explicit  model  reduction  scheme  is  consid¬ 
ered  to  reduce  the  dimensionality  of  the  random  permeability  field  without 
the  need  for  introducing  hidden  variables;  and  (4)  the  graph  structure  and 
graph  learning  scheme  are  implemented  in  a  completely  different  algorithmic 
approach  based  on  the  Expectaction/Maximization  (EM)  algorithm  and  a 
sampling  based  approach  to  nonparametric  belief  propagation. 

This  paper  is  organized  as  follows.  First,  the  problem  definition  is  given 
in  Section  2.  Then  the  basic  procedure  of  how  to  construct  an  appropri¬ 
ate  graphical  model  and  all  the  associated  algorithms  are  discussed  in  Sec¬ 
tions  3,  4  and  5.  In  Section  6,  we  introduce  the  porous  media  flow  problem 
and  provide  various  examples  demonstrating  the  efficiency  and  accuracy  of 
the  graphical  model  approach.  Brief  discussion  and  conclusions  are  finally 
provided  in  Section  7. 

2.  Problem  definition 

Let  (D,  T ,  V)  be  a  probability  space,  where  fl  is  the  sample  space  corre¬ 
sponding  to  the  outcomes  of  some  experiments,  T  a  cr-algebra  of  subsets  in 
fl  (these  subsets  are  called  events)  and  V  :  T  — >  [0, 1]  the  probability  mea- 
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sure.  In  this  framework,  let  us  define  D  C  Md,  d  =  1,2,3  the  spatial  domain 
of  interest  (physical  space),  x  6  D  a  spatial  point,  where  x  =  (aq, . . .  ,Xd)- 
Consider  a  random  field  { Ax}xex?-  We  assume  Ax  is  a  function  that  maps 
the  probability  space  fl  to  M,  i.e., 

Ax  :  Q  -»•  M,  (1) 

which  assigns  to  each  element  uj  G  hi  a  real-value  (considering  isotropic  per¬ 
meability)  Ax  at  a  specific  point  x.  We  define  a  =  A  (a;),  oj  G  a  realization 
of  A. 

In  practice,  the  physical  domain  is  decomposed  into  fine-elements  where 
the  permeability  is  defined.  For  example,  consider  a  partition,  7}  for  the 
domain  D  into  non-overlapping  elements  e*,  i.e.,  7}  =  (J^fj  e%1  where  Nf 
is  the  number  of  fine-elements  in  the  domain.  The  random  input  field  is 
specified  on  the  fine-grid  and  A(u)  can  be  written  as  follows: 

A(cn)  =  (Ai(u;),  A2(cu), . . . ,  AiV/(c a)).  (2) 

For  most  cases,  we  are  only  interested  in  the  responses  on  a  coarser  grid 
than  Tf.  Therefore,  let  us  define  a  coarser  partition  of  the  same  domain  D. 
Denote  this  partition  as  Tc  =  IJ;:=i  where  Nc  is  the  number  of  coarse- 
elements.  Fig.  1  shows  a  fine-grid  (finer  lines)  and  a  corresponding  coarse- 
grid  (heavier  lines).  Let  Nq  denotes  the  number  of  nodes  on  the  coarse-grid. 
Then  we  can  represent  the  response  field  Y(ca)  =  (Yx(u;)  :  x  e  D}  as  a 
discrete  set  of  responses  on  the  coarse-nodes  as  follows: 

Y(W)  =  (YxH,  Y2(c o),  . . . ,  Yjvg(w)).  (3) 

The  values  of  Yx(ca)  on  other  spatial  points  can  be  interpolated. 

In  uncertainty  quantification  tasks,  one  specifies  a  probability  density  on 
the  input  A,  p(A),  and  is  interested  to  quantify  the  probability  measure 
induced  by  it  on  the  response  field.  This  can  be  achieved  by  marginalizing 
Y  from  the  joint  distribution  of  p( Y,  A)  as: 

P(  Y)  =  J  p(Y,A)dA 

=  j  p(Y\A)p(A)dA.  (4) 

If  we  assume  a  constant  permeability  at  each  fine-scale  element,  then 
A  is  described  by  a  high-dimensional  ( Nf )  set  of  highly  correlated  random 
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Figure  1:  Schematic  of  the  domain  partition:  (a)  fine-  and  coarse-scale  grids  and  (b) 
fine-scale  local  region  in  one  coarse-element. 

variables  with  a  distribution  that  is  hard  to  evaluate.  A  common  way  to  deal 
with  such  problem  is  creating  a  reduced-dimensionality  input  model.  For 
example,  one  can  perform  a  Karhunen-Loeve  expansion  [16]  for  a  given  set 
of  permeability  realizations,  and  then  truncate  this  expansion  after  k ^  terms, 
and  use  k ^  ( k ^  <<  dim( A))  random  variables  £  to  represent  the  permeability 
distribution.  Therefore,  the  above  uncertainty  propagation  problem  can  be 
re-formulated  as: 


p( Y)  =  J  p( Y,  A)dA 
«  Jp(Y,S)d£ 

=  J  p(.Y\d)p(Z)dt  (5) 

In  many  problems,  £  is  still  high-dimensional  which  prevents  us  from 
efficiently  finding  the  conditional  distribution  p(Y|£).  Hence,  we  propose  a 
way  to  localize  the  connection  between  the  inputs  and  responses,  but  also 
link  each  local  connection  in  a  global  sense.  The  localized  connection  will  be 
introduced  by  assuming  that  the  response  at  a  give  coarse-node  is  correlated 
mostly  with  the  random  permeability  at  the  neighboring  coarse-elements. 
Thus  for  each  coarse-node  i  (and  thus  response  i/i),  we  can  affiliate  a  reduced 
set  of  random  variables  s,  that  define  the  permeability  random  held  at  the 
coarse-elements  adjacent  to  node  i.  This  is  discussed  next. 
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3.  Model  reduction 

Let  us  assume  that  a  set  of  realizations  of  the  random  input  field  A  are 
given.  Using  the  Karhunen-Loeve  expansion  [16]  with  the  given  permeability 
realizations  defined  over  the  whole  spatial  domain  D,  we  can  compute  the 
global  reduced  representation  of  the  permeability  in  terms  of  the  random 
variables  We  can  symbolically  write  the  model  reduction  process  as  follows: 

£  =  7^(a),  (6) 

where  a  is  a  realization  of  the  random  field  defined  on  the  fine-grid,  and 
7 Zy  is  the  global  reduction  map.  As  part  of  this  map,  for  each  realization 
of  the  permeability  field,  we  assume  that  there  is  a  unique  realization  of  the 
random  variables  However,  the  difficulty  arises  because  the  dimensionality 
of  £,  &£,  is  high,  i.e. ,  >>  1.  Considering  that  each  response  on  the 

coarse-grid  depends  on  the  permeability  field  over  the  whole  domain  (via  the 
solution  of  the  underlying  porous  media  flow  boundary  value  problem),  it 
is  rather  difficult  to  build  a  probabilistic  link  between  the  response  Y  and 
the  reduced  input  £.  Hence,  we  make  a  physically  reasonable  assumption  for 
many  problems  that  the  response  at  one  coarse-grid  node  correlates  strongly 
on  the  input  permeability  in  the  underlying  nearest  coarse-elements,  and  that 
the  influence  by  the  permeability  at  all  other  coarse-elements  can  be  ignored, 
as  shown  in  Fig.  2. 

Let  Tj  C  {1, . . . ,  fVc}  be  the  set  of  coarse-elements  corresponding  to  the 
coarse-node  i.  Let  ap.  denote  the  input  permeability  held  over  the  corre¬ 
sponding  coarse-elements  to  coarse-node  i  (see  Fig.  2).  Based  on  the  given 
realizations  of  the  permeability  ap . ,  one  can  perform  a  Karhunen-Loeve  ex¬ 
pansion  and  obtain  the  reduced  representation  s*  of  the  permeability  held 
over  Tj  that  encodes  most  of  the  information  relevant  to  the  response  at 
coarse-node  i  as: 

s*  =  ftp^ap.),  (7) 

where  77pi  is  the  reduction  map  for  the  permeability  in  the  elements  T*.  s,; 
should  be  correlated  to  its  neighboring  reduced  representation  s j  due  to  the 
overlapped  inputs  considered,  as  shown  in  Fig.  2. 

The  connection  between  this  local  model  reduction  and  the  global  model 
reduction  in  Eq.  (6)  is  given  as  follows:  Let  Cg  be  the  global  reconstruction 
map  such  that: 

n9(ca{i))  =  i,  (8) 
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Figure  2:  An  illustration  of  the  model  reduction  framework  considered  in  this  paper. 
The  response  at  each  coarse-node  depends  on  the  permeability  field  at  the  neighboring 
coarse-elements. 


and  let  Cr;  be  the  local  reconstruction  map  corresponding  to  the  model  re¬ 
duction  for  the  ith  coarse-grid  node  defined  as: 

ar;  =  Cri(sj),  (9) 

where  ap,  is  the  reconstructed  local  random  field  on  the  coarse-elements  T;. 
We  can  write  the  following: 

S.  M  Kr,([C9(fl]ri),  (10) 

where  [-]p.  is  the  restriction  of  a  =  C9(£)  over  Tj.  Similarly,  we  can  write, 

£  «  1Zg{TL[Cr^  (si), . . .  ,CrjVG(s7vG)]},  (11) 

where  7i  is  an  implicit  function  that  averages  permeability  realizations  ob¬ 
tained  from  local  overlapping  reduction  models.  The  above  equation  implic¬ 
itly  defines  how  the  local  reduced  input  Sj  is  correlated  to  £  through  the 
reduction/reconstruction  maps.  The  function  7 i  approximates  the  global  re¬ 
construction  function  Cg,  thus  the  better  the  choice  of  7i  is,  the  closer  the 
local/global  approximations  we  obtain.  In  this  work,  the  function  7i  is  simply 
chosen  as: 


H(-)  k 


1 


N, 


overlap 


-^overlap 

E  [Cri(si)]Ei’ 


3= 1 


(12) 


where  NoveT iap  is  the  number  of  overlapped  reconstruction  over  coarse-element 
E{ ,  and  [•].,;  is  the  restriction  of  ar.  =  Cr:i  (s3)  over  the  Ei  element. 

Remark  1:  The  reduced  variables  s,  affiliated  with  different  locations  i  are 
different  random  variables.  For  a  stationary  permeability  random  field,  local 
features  have  the  same  distribution  on  coarse-elements,  therefore,  the  local¬ 
ized  reduced  random  variables  s*  are  going  to  follow  the  same  distribution 
for  all  i  (not  considering  boundary  effects).  However,  notice  that  one  can 
not  say  these  s*  are  the  same  random  variables  even  if  they  follow  the  same 
distribution.  Given  a  realization  of  stationary  stochastic  input  a>n\  the  local 
features  aj,"'  and  a[n^  on  coarse-elements  Ek  and  Ei  are  in  general  different. 
For  nonstationary  random  permeability  field,  the  s*  variables  at  different 
locations  i  follow  different  marginal  distributions. 

The  localization  of  the  input  model  reduction  outlined  above  can  be  im¬ 
plemented  with  literally  any  model  reduction  technique  including  linear  and 
nonlinear  dimension  reduction  algorithms.  For  linear  dimension  reduction, 
the  most  famous  and  the  most  widely  used  method  is  the  Principal  Compo¬ 
nent  Analysis  (PCA)  [17]  method.  The  first  version  of  PCA  method  appeared 
half  a  century  ago  and  it  has  been  shown  since  then  to  be  a  reliable  reduc¬ 
tion  method  forming  the  basis  of  many  other  more  advanced  mathematical 
reduction  methodologies.  In  the  last  decade,  a  large  number  of  nonlinear  di¬ 
mensional  reduction  techniques  have  been  proposed  (e.g.,  [18,  19,  20]).  Most 
of  the  nonlinear  techniques  are  not  as  well  studied  and  have  been  shown 
often  to  provide  better  performance  than  PCA  for  artificial  than  physical 
nonlinear  datasets  [21].  These  methods  perform  not  better  (sometimes  much 
poorer)  than  PCA  for  natural  datasets  [21].  Therefore,  here  for  simplicity 
of  the  presentation,  we  choose  PCA  as  the  dimension  reduction  technique. 
This  will  allow  us  to  emphasize  the  graph  theoretic  approach  for  solving  the 
underlying  stochastic  flow  problem  of  interest.  The  basic  algorithm  for  the 
PCA  method  used  is  summarized  for  completeness  in  Appendix  A. 

Up  to  now,  we  have  completely  defined  the  relationship  between  £  and 
S  =  {s1; . . . ,  SjvG}-  Given  the  distribution  of  £,  it  is  straightforward  to  use 
Monte  Carlo  method  to  fold  the  distribution  of  S,  p( S).  Then  computing 
p( Y)  requires  to  compute  the  conditional  p(Y|S)  where  Y  =  {y i,  y2, . . . ,  Vng}- 
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Indeed,  we  can  write  the  following: 


P(  Y) 


p(Y\Mm 


p(YlS)p(S\€)p(e)dSd( 


=  M  Y|S) 


c/S 


=  /  p(Y|S)p(S)dS 


/  p(?/l?  2/2 ?  •  •  •  ?  1^1)  •  •  •  5 


(13) 


As  discussed  in  Section  1,  probabilistic  graphical  models  [4]  can  be  used  to 
systematically  explain  the  probabilistic  relationship  between  S  =  {si, . . .  ,sng} 
and  the  responses  Y  =  {yi,  y2, . . . ,  yNa}-  Their  joint  distribution  can  be  par¬ 
titioned  in  a  way  the  accounts  for  the  local  nature  of  the  dependence/correlation 
of  the  response  to  the  input  variables.  The  details  of  such  approach  are  in¬ 
troduced  in  Section  4. 


4.  Probabilistic  graphical  model 

We  are  given  a  number  of  realizations  of  the  localized  reduced  input 
random  variables  S  and  corresponding  responses  Y,  and  also  the  distribution 
of  S.  Our  first  objective  is  building  a  probabilistic  graphical  model  between  S 
and  Y,  based  on  the  given  set  of  realizations.  We  next  plan  to  use  an  inference 
algorithm  on  the  probabilistic  graph  to  address  uncertainty  quantification 
problems. 

4-1.  Brief  introduction  to  probabilistic  graphical  models 

A  graphical  model  aims  to  represent  the  joint  probability  distribution 
of  many  random  variables  efficiently  by  exploiting  factorization  [4],  The 
two  most  common  forms  of  graphical  models  are  directed  graphical  models 
and  undirected  graphical  models,  based  on  directed  graphs  and  undirected 
graphs,  respectively.  The  dependence  relationship  is  visible  directly  from 
the  graph  for  the  directed  graph  model,  while  the  dependence  relationship 
is  hidden  in  the  undirected  graph.  I11  this  work,  the  dependence  relation¬ 
ships  between  the  response  random  variables  are  not  clear,  so  we  focus  on 
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the  undirected  graph,  which  is  also  called  pairwise  Markov  Random  Field 
(MRF)  [11]. 

Let  Q(y,  S)  be  an  undirected  graph,  where  V  are  the  nodes  (random  vari¬ 
ables)  and  £  are  the  edges  of  the  graph  (correlations).  Let  {Xv  :  Xi  G  V}  be 
a  collection  of  random  variables  indexed  by  the  nodes  of  the  graph  and  let 
C  denote  a  collection  of  cliques  of  the  graph  (i.e.,  fully  connected  subsects 
of  nodes).  Associated  with  each  clique  c  G  C,  let  0C(XC)  denote  a  nonnega¬ 
tive  potential  function,  which  implicitly  encodes  the  dependence  information 
among  the  nodes  within  the  clique.  The  joint  probability  p(X v)  is  defined 
by  taking  the  product  over  these  potential  functions  and  normalizing, 

p(w)=|r ]>(w),  (Mi 

cGC 


where  Z  is  a  normalization  factor. 

The  graphical  model  representation  makes  the  inference  problem  eas¬ 
ier.  The  general  algorithm  of  probabilistic  inference  is  that  of  computing 
the  marginal  probability  p(X^)  or  conditional  probability  p(X^|Xo),  where 
V  =  O  U  TC  for  given  subsets  O  and  hi.  The  belief  propagation  algorithm 
(inference)  is  then  applied  to  find  the  marginal  or  conditional  probabilities 
of  interest.  Notice  if  an  event  {Xq  =  x^}  is  observed,  the  original  clique 
potentials  need  to  be  modified,  that  is,  for  {JQ,  %  e  O},  we  multiply  the  po¬ 
tential  (j)c(Xc )  by  the  Kronecker  Delta  function  Sx,,  (x?;)  for  any  clique  c  G  C 
such  that  {i  G  c  fl  O},  where  x,:  is  the  observation  of  node  i.  The  detailed 
inference  algorithm  will  be  discussed  in  Section  5. 

4-2.  The  structure  of  the  graph 

To  find  an  efficient  structure  of  the  graph,  let  us  start  from  the  joint 
distribution  of  (Y,  S,  £), 

p(Y,S,0=p(Y|S)p(S|0p(£).  (15) 

The  above  decomposition  is  based  on  the  assumption  that  S  contains  all 
information  £  contains  for  the  calculation  of  Y.  The  probabilistic  relationship 
between  S  and  £  is  discussed  in  Section  3.  Each  variable  s;  6  S  has  a 
deterministic  relationship  with  £,  therefore,  all  the  Sj’s  should  be  directly 
linked  with  The  correlation  among  the  s,  variables  is  then  reflected  via 
their  connections  with  The  corresponding  structure  between  S  and  £  is 
given  in  Fig.  3. 
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Vi,  n, 


Figure  3:  The  general  graph  structure  for  the  problem  of  interest.  The  y  variables  represent 
the  response  of  the  system  (velocities  and/or  pressure  on  a  coarse-grid),  £  represents  the 
reduced  set  of  random  variables  defining  the  random  permeability  over  the  whole  domain 
D  and  s,  is  the  reduced  set  of  random  variables  defining  the  random  permeability  on  the 
patch  of  coarse-elements  that  share  the  coarse-node  i. 

To  find  the  structure  between  Y  and  S,  we  need  to  find  an  approximate 
decomposition  ofp(Y|S)  in  Eq.  (13), 

P(Y|S)  =p(y1,y2,...,yNG\s1,s2,...,sNG).  (16) 

In  this  work,  we  consider  only  the  pairwise  correlations  between  the  re¬ 
sponse  random  variables  y,  among  which,  only  correlations  between  neigh¬ 
boring  response  variables  are  considered.  We  further  assume  that  these  cor¬ 
relations  do  not  depend  on  the  local  features.  So  the  conditional  distribution 
p(Y|S)  can  be  decomposed  as 

ng 

p(y|s)  «  n^(^lSi’ s2>  ■  ■  ■  > sng)  Yl  P(Vi,yj),  (17) 

i=l  jeT(i) 

where  T(z)  denotes  the  set  of  neighboring  nodes  of  node  i. 

Remark  2:  This  decomposition  is  inspired  by  the  general  treatment  to 
the  conditional  random  field  representation  of  a  Gibbs  distribution,  where 
in  principle  the  explicit  expansion  of  the  conditional  distribution  involves 
one-body  term  and  two-body  interaction  terms  to  n-body  interaction  term. 
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Iii  practice,  we  ignore  higher-order  interactions  and  only  keep  the  first  two 
terms  [22,  23].  In  [15],  the  authors  also  apply  a  similar  idea  in  factorizing 
the  complex  conditional  probability  distribution,  however  they  assume  that 
the  correlations  between  the  physical  responses  are  also  dependent  explicitly 
on  property  features. 

To  further  simplify  the  dependencies  in  the  factorization  of  p(Y|S),  we 
further  assume  that  the  response  at  one  coarse-grid  node  is  strongly  depen¬ 
dent  on  the  inputs  in  its  underlying  nearest  coarse-elements,  thus  the  influ¬ 
ence  by  all  other  inputs  is  ignored.  In  other  words,  we  are  assuming  that  t/i 
only  depends  on  its  underlying  localized  reduced  input  Sj.  This  is  in  analogy 
to  various  multiscale  methods  (e.g.  the  MsFEM  method  [24])  where  in  the 
calculation  of  the  local  multiscale  basis  functions  only  the  local  permeability 
is  considered.  Hence,  the  above  equation  can  be  further  decomposed  as 

ng 

?(yis)  « n  p(yi\si)Y[p(yi,yj)-  (18) 

i=  1  i,j 

The  constructed  structure  of  the  undirected  graph  is  shown  in  Fig.  3.  If 
we  write  Eq.  (18)  as  a  product  of  potential  functions  in  the  graphical  model, 
we  can  obtain 


p(Y|S)  oc  JJ  (pk(yk,  8*)  JJ  il> ij(yi,Vj )•  (19) 

fcev(y)  (i,j)e£(«) 


There  are  two  kinds  of  potential  functions  in  Eq.  (19),  one  is  pair  potential 
functions,  -0(*),  that  model  the  correlation  between  neighboring  response 
variables,  the  other  one  is  the  cross  potential  functions,  <£>(*),  which  interprets 
the  relationship  between  the  reduced  input  variables  s,  and  response  variables 
Hi.  In  the  following,  we  denote  with  the  set  of  the  localized  reduced 
input  nodes  (coarse-grid  nodes),  and  with  the  set  of  the  edges  between 
the  response  variables  (edges  of  the  coarse-elements). 

In  this  work,  the  potential  functions  between  s,;  and  £  are  difficult  to 
model  due  to  their  high  dimensionality  nature.  However,  S  and  £  are  explic¬ 
itly  known  to  us,  so  is  the  relationship  between  them.  Therefore,  we  do  not 
have  to  learn  these  potential  functions.  In  Section  5,  we  will  discuss  how  we 
perform  the  inference  problem  without  using  the  potential  functions  between 
s,;  and 
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4-3.  Learning  the  graphical  model 

Since  we  are  considering  a  nonparametric  graphical  model,  the  potential 
functions  should  have  Gaussian  mixture  forms.  As  discussed  above,  there 
are  two  types  of  potential  functions,  the  pairwise  potential  function  for  the 
response  variables,  ^(*),  and  the  cross  potential  function  for  response  vari¬ 
ables  and  localized  reduced  input  variables,  as  given  in  Eq.  (19).  In 

this  work,  both  of  the  potential  functions  are  designed  to  have  the  following 
form, 

M 

AM,  z,)  =  V  MW  ((*,  Zj);  E(m)) .  (20) 

m=l 


where  z%  and  z3  denote  the  random  variables  on  node  i  and  node  j  and 
A /"(•)  is  the  Gaussian  distribution.  The  unknown  parameters  in  the  above 
potential  functions  are  {u/m\  pSm\  m  =  1  ,...,M},  where  o/m)  is  the 
weight  (scalar)  for  component  m,  /i-"h  is  the  mean  for  component  m  (the 
size  of  the  mean  vector  is  equal  to  the  sum  of  dimensions  of  Zj  and  Zj ),  and 
E(”)  is  the  covariance  matrix. 

In  this  work,  the  unknown  parameters  in  the  potential  functions  are 
learned  by  maximizing  the  log- likelihood.  Denote©  =  {Oi,j  :  (i,j)  G  and  0*  :  i  G 

as  the  set  of  all  the  unknown  parameters,  where  dt  ]  =  | ,  //-j  \  E)-”,'-) ;  rn  =  1, . . . ,  M  j 

and  6i  =  ju n[m\  m  =  1, . . . ,  M  j .  These  parameters  can  be  calcu¬ 
lated  locally  in  the  coarse-grid.  For  specific  i,j  such  that  i  G  V(y}  and 
(i,j)  G  £(y\  let  us  consider  given  N  observations  of  {(s-n\  s^),  (y-n\  y^)} 
for  n  =  1 , ,N.  The  log-likelihood  can  then  be  calculated  as, 


N 


n=  1 


E  hogptoh.shie.)  +  iogp(»,(n),</i”V»j 


(21) 


By  maximizing  the  log-likelihood,  we  obtain 


(Oi,  0,.  j) 


ai  g  maxw  w 


(22) 

Notice  that  maximizing  the  log-likelihood  is  equivalent  to  maximizing 
each  component  of  Eq.  (21)  separately,  therefore,  the  graph  learning  problem 
can  be  divided  into  a  number  of  local  learning  problems.  For  example,  to 
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learn  6i:  we  only  need  to  maximize  ^2^=i^ogp(y^n\s[n‘l\Gi)  using  the  local 
training  data  set  {y\n\  s)-n') ;  i  =  1, ...  ,N}. 

Remark  3:  The  parameters  dehne  the  correlation  between  the  response 
variables,  whereas  6,  interpret  the  dependence  relation  between  a  response 
variable  and  its  underlying  localized  reduced  random  input.  Both  of  these 
parameters  are  computed  locally  using  the  training  data.  Thus  the  compu¬ 
tational  cost  affiliated  with  the  estimation  of  the  parameters  that  dehne  the 
probabilistic  dependencies  in  the  graph  is  minimal.  Note  that  the  approach 
used  here  is  different  from  that  in  [15]  where  local  estimation  problems  are 
only  posed  to  compute  the  dependencies  on  the  input  permeability  permeabil¬ 
ity  of  the  local  potentials.  In  this  work,  the  effect  of  the  input  permeability 
is  introduced  via  the  local  random  variables  s*  and  the  potentials  considered 
are  Gaussian  mixtures  with  unknown  parameters.  The  potentials  in  [15]  are 
simple  Gaussians.  In  general  case,  all  the  Qi.j  and  0,  are  different  across  the 
graph  (i.e.  for  different  i  and  j). 

Remark  4:  For  a  stationary  permeability  case,  the  variables  s,  follow  the 
same  distribution  but  this  does  not  imply  that  6,  are  the  same  parame¬ 
ters.  Taking  simultaneous  realizations  of  Sj  and  s j  leads  to  different  lo¬ 
cal  permeability  realizations  and  thus  different  response  fields  y ^  and 

n  =  1, . . . ,  N .  In  the  calculation  of  6, ,  the  training  data  set  {yf  '\  sjn\  i  = 
1, ...  ,N}  that  we  use  vary  with  the  location  i  and  therefore  6i  differs  with 
location.  Similar  argument  can  be  made  about  the  location  dependence  of 
the  parameters  d,h3. 

The  Expectation  Maximization  (EM)  algorithm  is  chosen  to  maximize 
the  local  log-likelihood.  The  details  of  the  EM  algorithm  are  given  in  Ap¬ 
pendix  B.  Note  that  in  the  EM  algorithm  employed,  the  number  of  mixture 
components  M  is  predefined.  A  discussion  of  how  to  choose  M  is  provided 
in  Section  6.1. 

5.  Inference  problem 

The  general  inference  problem  in  a  graphical  model  is  to  find  the  marginal 
or  conditional  probabilities  of  interest  in  the  graph.  This  task  is  usually 
performed  by  using  the  belief  propagation  (BP)  algorithm  [4],  In  this  work, 
after  all  the  underlying  parameters  in  the  graph  are  successfully  learned,  the 
inference  problem  can  be  performed.  In  the  following,  we  first  provide  a  brief 
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introduction  to  the  general  belief  propagation  algorithm,  and  then  we  discuss 
in  detail  how  to  apply  the  belief  propagation  algorithm  into  our  framework. 

5.1.  General  belief  propagation 

Belief  propagation  (BP)  is  a  general  inference  algorithm  for  graphical 
models.  In  the  BP  algorithm,  each  node  k  iteratively  solves  the  global  in¬ 
ference  problem  by  integrating  information  from  the  local  environment,  and 
then  transmits  a  summary  message  to  all  its  neighbors  /  G  r(/c)  along  the 
edges,  where  T (k)  denotes  all  the  neighboring  nodes  of  node  k.  The  infor¬ 
mation  flow  during  this  process  is  called  the  message,  or  belief,  which  is  a 
function  containing  sufficient  information  of  the  “influence”  that  one  variable 
exerts  on  another. 

The  BP  algorithm  begins  by  randomly  initializing  all  messages  myk(yj), 
where  myk(yj)  denotes  message  from  node  yk  to  node  yj,  and  then  updating 
the  messages  along  each  edge  according  to  the  following  recursion  [11,  4],  as 
shown  in  Fig.  4(b): 


mvk(yj)  °c  /  ^(yk,Vj)  JJ  myi(yk)dyk. 
J  ier(k)\j 


(23) 


Figure  4:  Message-passing  recursions  underlying  the  BP  algorithm:  (a)  Approximate 
estimate  of  the  marginal  distribution  p(yk)  (Eq.  (24))  combines  the  local  input  information 
with  messages  from  neighboring  nodes;  (b)  A  new  outgoing  message  rnVk(yj)  from  node 
yt  towards  neighboring  node  yj  is  computed  from  all  other  incoming  messages  to  node  yk 
except  node  yj,  as  in  Eq.  (23). 

Generally,  the  marginal  distribution  of  each  node  (e.g.,  node  k)  given  the 
observation  can  be  computed  by  gathering  all  the  messages  coming  into  that 
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node  [11,  4],  as  shown  in  Fig.  4(a): 

p(yk)  «  Yl  mvi(yk)-  (24) 

zer(fc) 

5.2.  Inference  approach  for  the  problem  of  interest 

In  this  section,  suppose  p(£)  is  known.  The  objective  is  then  to  find  the 
marginal  distribution  of  p( Y)  via  the  belief  propagation  algorithm.  In  the 
particular  problem  of  interest,  there  are  three  main  challenges  with  regards  to 
the  inference  algorithm:  (1)  how  to  represent  and  calculate  the  message  from 
£  to  S  (discussed  in  Section  5.2.1),  (2)  how  to  treat  the  loops  in  the  graph 
(discussed  in  Section  5.2.2),  and  (3)  how  to  calculate  the  message  update  in 
the  form  of  a  Gaussian  mixture  (discussed  in  Section  5.2.3). 


Figure  5:  Message  flow  in  the  present  graphical  model  framework.  We  assume  a  two- 
dimensional  response  with  response  variables  (velovity  components/pressure  indicated  by 
the  blue  nodes). 

5.2.1.  Detailed  inference  algorithm 

The  illustration  of  the  message  flow  in  the  current  graphical  model  frame¬ 
work  is  given  in  Fig.  5.  Note  that  in  this  two-dimensional  framework,  we  iden¬ 
tify  our  nodes  with  two  indices  (i,  j)  rather  than  the  single  indices  1,2,,...,  Nq 
used  in  our  earlier  analysis.  There  are  four  types  of  messages  in  the  figure, 
(a)  message  from  a  response  node  y^,i)  1°  its  neighboring  response  node  t/py), 
mV(k  (Eq-  (25)  below),  (b)  message  from  a  response  node  y(i.j)  to  its 


17 


neighboring  localized  reduced  input  node,  rriy(i  f)  (spj))  (Eq.  (26)  below),  (c) 
message  from  a  localized  reduced  input  node  to  its  neighboring  response 
node  y(i.j),  m,S(i  f)  {y(i.j))  (Eq.  (27)  below),  and  (d)  message  from  the  global 
reduced  input  node  £  to  spj),  m^( S(i,j))  (calculated  by  Eq.  (29)  shown  below). 

Following  Eq.  (23),  the  message  from  a  response  node  y^,i)  to  its  neigh¬ 
boring  response  node  y(i,j)  can  be  calculated  as: 

y(P,q)^r(-yHy(k,i))\v(i,j) 

(25) 

where  T^Ay^q)  \  y(i,j)  denotes  all  the  nearest  neighboring  response  variables 
to  y(k,i)  excluding  y^j),  and  s^ti)  is  the  corresponding  reduced  input  repre¬ 
sentation  that  locally  influences  the  response  variable  y(k,i)-  ms{k  is 

the  message  sent  from  to  y(k,i),  and  it  can  be  calculated  by  Eq.  (27) 
given  below. 

The  message  from  a  response  node  y(ij)  to  its  neighboring  localized  re¬ 
duced  input  node  su^  can  be  calculated  as: 

my(i,j)  (s(*j))  ^  J  LP{i,j)(y(.,i;j)i  s(*j))  my(_k1i)  (y(i,j))dy(i,j)-  (^s) 

Denote  the  message  sent  from  £  to  the  spj)  variable  as  As¬ 

suming  that  this  message  is  known  now  (it  will  be  discussed  next),  then  the 
message  from  a  localized  reduced  input  node  S(j  n  to  its  neighboring  response 
node  ypj)  can  be  calculated  as: 

mS(ij)(y(i,j))  ^  J  V{i,i){y{i,j)i  S(i,j))m€(S(i,j))dS{i,j)-  (^7) 

Lastly  we  discuss  the  task  of  calculating  the  message  from  £  to  spj).  It 
is  hard  to  obtain  message  update  from  Eq.  (23),  because  both  £  to  spj)  are 
high  dimensional,  and  in  addition  we  need  to  calculate  the  messages  from  all 
the  other  s  variables  except  spj)  to  £.  To  bypass  these  difficulties,  we  use  a 
different  way  to  construct  the  unknown  message  from  the  information  that 
is  already  known  to  us  as  follows.  According  to  Eq.  (24),  we  can  write  the 
following: 


(28) 
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Since  S  are  known  to  us,  we  know  exactly  what  p( sq^))  is,  and  also  we 
know  my{.  (spj))  from  Eq.  (26).  Then  we  can  write: 


mc(s(iJ))  oc 


P(s(»j)) 


(29) 


As  a  result,  the  message  sent  from  £  to  is  updated  using  the  known 
marginal  distribution  of  spj)  and  my(ij)( spj)).  In  this  work,  this  message  is 

calculated  by  sampling  from  — -  using  the  Metropolis  Hastings  algo- 

rithm  [25].  The  details  of  how  to  calculate  m^( spy))  are  given  in  Appendix  C. 

Remark  5:  If  a  realization  of  the  stochastic  input,  a,  is  given,  then  there  is 
no  need  to  calculate  Eqs.  (26)  and  (29)  because  S  and  £  can  be  exactly  known 
from  the  model  reduction  scheme.  The  message  from  ^  to  l/py),  as  given  in 
Eq.  (27),  can  be  calculated  as:  rnS(iJ)(y{iJ))  oc  f  <P(ij)(y(ij),  S(ij))Sa(iJ)  (sj^ds^), 

where  the  Delta  function  only  takes  value  when  equals  to  the  given  re¬ 
alization. 

After  the  belief  propagation  algorithm  is  completed,  we  obtain  the  marginal 
distribution  of  the  physical  responses  conditioned  on  the  given  input,  e.g. 
p(y(i,j) |a).  Let  the  expectation  E[j/(jj)|a]  be  the  predicted  values  of  the  phys¬ 
ical  responses  and  the  variance  be  the  corresponding  error  bars.  We  thus 
obtain  a  surrogate  model  based  on  the  graphical  model  that  for  an  any  input 
realization  provides  us  the  response  of  the  system  as  well  as  our  confidence 
on  this  prediction. 

In  the  uncertainty  quantification  problem,  one  exerts  a  known  distribu¬ 
tion  on  the  input  A,  p( A),  and  is  interested  to  quantify  the  probability 
induced  by  it  on  the  response.  Using  the  model  reduction  techniques  dis¬ 
cussed  in  Section  3,  we  can  explicitly  compute  the  distribution  of  £  and  S, 
p(£)  and  p( S),  respectively,  given  p( A)  or  a  set  of  realizations  of  A.  Then,  by 
executing  the  inference  problem  discussed  above,  we  can  obtain  an  explicit 
representation  of  the  marginal  distribution  of  all  the  responses  by  gathering 
all  the  messages  coming  from  the  neighbors  (as  in  Eq.  (24)).  The  statistics 
of  interest  can  be  calculated  directly  from  the  marginal  distribution. 


5.2.2.  Loopy  belief  propagation  (LBP) 

For  graphs  with  loops,  there  are  two  major  types  of  treatment.  The  first 
approach  is  to  break  the  loops,  by  either  cutting  the  cycles  or  finding  an 
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equivalent  surrogate  graph,  such  as  in  the  graph  cut  [26]  or  the  junction  tree 
algorithms  [27].  The  second  approach  is  to  find  an  efficient  way  to  recursively 
update  the  messages  until  convergence.  Although  until  now  there  is  no  strict 
mathematical  justification  that  the  loopy  belief  propagation  converges  to  the 
true  marginals,  in  many  applications,  the  resulting  LBP  algorithm  exhibits 
excellent  performance  [28,  29,  30,  31].  Recently,  several  theoretical  studies 
have  provided  insights  into  the  approximations  made  by  LBP,  establishing 
connections  to  other  variational  inference  algorithms  and  partially  justifying 
its  application  to  graphs  with  cycles  [32,  33].  In  this  work,  we  simply  follow 
the  second  approach,  by  finding  an  efficient  way  to  calculate  all  the  messages 
in  one  iteration.  The  complete  procedure  is  given  in  Algorithm  1  and  depicted 
in  Fig.  6.  The  messages  are  considered  as  converged  if  their  change  is  less 
than  a  threshold  in  two  successive  iterations  as  in  Eq.  (30). 


Figure  6:  Illustration  of  the  Loopy  Belief  Algorithm  used  in  this  paper,  (a)  Message  flow 
from  the  root  node  to  the  end  node,  as  in  Step  2.1.b  in  Algorithm  1.  (b)  Message  flow 
from  the  end  node  to  the  root  node,  as  in  step  2.1.c  in  Algorithm  1. 


5.2.3.  Nonparametric  belief  propagation 

In  the  nonparametric  graphical  model,  each  message  is  represented  by 
a  Gaussian  mixture.  Then  the  belief  update  (Eqs.  (25)-(27))  becomes  an¬ 
alytically  intractable.  Currently  there  are  two  possible  approximations  for 
performing  the  belief  update.  The  first  one  is  using  a  variational  method  [34], 
The  basic  idea  of  the  variational  method  is  using  a  much  simpler  form  (user 
defined)  to  obtain  an  approximation  that  is  as  close  as  possible  to  the  target 
message.  The  second  approach  is  using  a  sampling  method  [13].  The  idea  of 
the  sampling  method  comes  from  particle  filters.  In  [14],  it  was  extended  to 
graphs  containing  continuous,  uou-Gaussian  variables  leading  to  the  so  called 
“Nonparametric  belief  propagation  (NBP)  method”.  In  this  work,  we  utilize 
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Algorithm  1  The  complete  inference  algorithm 
1:  Initialization:  With  given  p(£)  and  the  deterministic  relationship  between 
S  and  £,  p( S)  can  be  obtained  via  MC  method  as  discussed  in  Section  3. 
We  set  the  initial  message  as  m^\ S(ij))  =  p(s(i,j)),  and  all  other  messages 
as  a  standard  Gaussian  A/"(0, 1). 

2:  Iterate:  At  step  t, 

(1)  Update  as  in  Eq.  (25), 

a)  Set  one  node  as  the  root  node,  and  another  node  as  the  end 
node. 

b)  Calculate  all  the  messages  from  the  root  node  to  the  end  node, 
as  shown  in  Fig.  6(a). 

c)  Calculate  all  the  messages  from  the  end  node  to  the  root  node, 
as  shown  in  Fig.  6(b). 

(2)  Update  d)  (spj))  as  in  Eq.  (26). 

(3)  Update  m^\ spj))  as  in  Eq.  (29). 

(4)  Update  tj)  (y&j))  as  in  Eq.  (27). 

3:  Convergence:  the  algorithm  stops  when, 


e  = 


N, 


y(») 


£ 


gy(«) 


i/(fc>l)e r(2/CiJ)) 


( y{i,j )) 


JV(k,l) 


(y«j)) 


(t~i) 


(30) 

where  vy(k  t)  (y(i,j))®  denotes  the  estimated  variance  of  the  message  from 
the  neighboring  node  y^,i)  of  y(ij)  to  node  V(ij),  which  can  be  cal¬ 


culated  as  u„(fci0(j/(ij))(t)  =  E™=i(4 


(m) 

V{k,l)~ 


(m) 

>y(i,j)ay(k,i)- 

and  a. 


(m) 


(see  Ap- 
are  the 


Pendix  D  for  the  proof),  where  ujy(kl)^y{iJ)  auu 
weight  and  variance  for  component  (m)  in  the  representation  of  message 


m 


y(k,l)(V(i,j ))  Em=l4(M)- 


(  n(m)  1 rr(m) 

-/v  '  ry(k,i)^y(i,j)i  \uy(k,i)^y(' 


vn)2)- 


Notice 


that  in  this  work,  all  the  messages  are  normalized  after  being  updated. 
The  subscript  t  denotes  the  step  number. 
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the  NBP  algorithm  to  perform  the  inference  problem.  Specifically,  we  use  the 
NBP  algorithm  to  approximately  find  the  update  of  Eqs.  (25),  (26)  and  (27) 
(corresponding  to  Steps  2.1,  2.2,  and  2.4  in  Algorithm  1,  respectively).  In  the 
following,  we  clearly  demonstrate  how  to  use  the  NBP  algorithm  to  compute 
the  message  update  in  Eq.  (25).  The  message  update  in  Eqs.  (26)  and  (27) 
is  similar. 

The  NBP  algorithm  approximates  the  belief  update  Eq.  (25)  using  a 
sampling  method  [14].  It  circumvents  sampling  directly  from  Eq.  (25)  (which 
is  rather  difficult  task)  by  decomposing  the  process  into  two  steps.  In  the 
first  step,  we  draw  N  independent  samples  ^  from  a  partial  belief  estimate 
combining  the  marginal  influence  function  of  the  pairwise  potential  function 
'ip{y(i,j),V(k,i))  on  ?/(fc)z),  cross  potential  <p(k,i)(y(k,i),  and  all  the  other 

incoming  messages  (Eq.  (32)).  The  marginal  influence  function  C (2/(fc,z))  is 
defined  by 

C(y(k,i))  j  y(k,i))dy(i  j).  (3i) 

In  this  work,  ^(ypj),  2/(fc,o)  is  a  Gaussian  mixture,  so  C(z/(ifc,o)  is  simply  the 
Gaussian  mixture  obtained  by  marginalizing  each  component. 

In  the  second  step,  for  each  of  these  auxiliary  particles  we  make 

samples  from  the  normalized  conditional  potentials  proportional  to 

^(ypj),  y(k,i)  =  y{k\))  (Eq.  (33)).  The  detailed  algorithm  of  NBP  is  sum¬ 
marized  in  Algorithm  2. 

Remark  6:  Since  we  are  using  a  Gaussian  mixture  to  represent  all  messages, 
an  inevitable  problem  will  arise  with  the  increase  of  the  number  of  mixture 
components.  For  example,  assume  that  all  the  messages  are  M-component 
Gaussian  mixtures,  and  the  BP  belief  update  of  Eq.  (25)  is  defined  by  a 
product  of  h  mixtures.  The  product  of  h  Gaussian  mixtures,  each  containing 
M  components,  is  itself  a  mixture  of  Mh  Gaussian  distributions.  While 
in  principle  this  belief  update  could  be  performed  exactly,  the  exponential 
growth  in  the  number  of  mixture  components  quickly  becomes  intractable. 
Therefore,  in  this  work,  we  use  Gaussian  mixture  reduction  via  clustering 
(GMRC)  method  to  reduce  the  number  of  mixture  components  whenever 
the  number  of  mixture  components  of  the  beliefs  exceed  M.  The  details  of 
GMRC  algorithm  are  given  in  Appendix  E. 
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Algorithm  2  The  detailed  algorithm  for  nonparametric  belief  propagation 

Given:  Input  message  mV(M)  {y{k,i))  =  {4™L)^/Cm)>  4r™!9)-n/(k,o  > 
for  each  y(p,q)  £  T (jj(k,i))  \  V(i,j)- 
Objective:  Construct  an  output  message  my(k  (y(ij)). 

1:  Determine  the  marginal  influence  C(?/(fc,o)  by  Eq.  (31). 

2:  Draw  N  independent,  weighted  samples  from  the  product, 

~  C(y{k,i))v{k,i){y(k,i),  s(k,i))  my(P,q)(y(k,i))-  (32) 

y(p>q)  (y  (k  ,i))\y  (i  ,j) 

GMRC  (A  Gaussian  mixture  reduction  technique  discussed  in  Ap¬ 
pendix  E)  is  first  adopted  to  reduce  the  components  of  the  product  and 
then  exact  sampling  method  [35]  is  applied. 

3:  For  each  y^\y  sample  from, 


~(n) 

y\ii) 


^(2/bjb  V(k,i)  y[k]i))- 


(33) 


4:  Construct  mv 


lV(k,i)bj{i,j))  from 

of  message  m^^y^)). 

'M  .  (m)  A/7i/im)  v(m) 

\H-y(k,i)^y(ij)i  z-‘y(k,i) 


by  taking  y(("j)  as  realizations 


oc 


•tr^M  ( 
2^m=  1  UV 


)-*(«)’  Ev?k,i)->y(ij)}?Lv  These  can  be  learned  using  the 
EM  algorithm  discussed  in  Appendix  B  by  taking  yj”])  as  training  sam¬ 
ples. 


,(m) 


Specihcally,  assume  my(kl){y(i,j)) 

i  f) ) ,  where  the  unknowns  are 


23 


6.  Numerical  examples 

In  this  paper,  we  construct  a  probabilistic  graphical  model  to  study  two- 
dimensional,  single  phase,  steady-state  fluid  flow  through  random  heteroge¬ 
neous  porous  media.  A  review  of  the  mathematical  models  of  flow  through 
porous  media  can  be  found  in  [36] .  The  spatial  domain  D  is  chosen  to  be  the 
unit  square  [0,  l]2,  representing  an  idealized  oil  reservoir.  Let  us  denote  with 
p  and  u  the  pressure  and  the  velocity  fields  of  the  fluid,  respectively.  These 
are  connected  via  the  Darcy  law: 

u  =  — KVp,  in  D,  (34) 

where  K  is  the  permeability  tensor  that  models  the  easiness  with  which  the 
liquid  flows  through  the  reservoir.  Combining  the  Darcy  law  with  the  con¬ 
tinuity  equation,  it  is  easy  to  show  that  the  governing  PDE  for  the  pressure 
is: 


— V  •  (KVp)  —  f,in  D,  (35) 

where  the  source  term  /  may  be  used  to  model  injection/production  wells.  In 
this  example,  we  consider  square  wells:  an  injection  well  on  the  left-bottom 
corner  of  D  and  a  production  well  on  the  top-right  corner.  The  particular 
mathematical  form  of  the  source  term  /  is  as  follows: 


f  -r, 

if  \xi  —  \w\  <  for  7  =  1,2, 

/(x)  =  <  r, 

if  \xi  —  1  +  <  |w,  for  7  =  1,2, 

(36) 

l  o, 

otherwise, 

where  r  specifies  the  rate  of  the  wells,  w  their  size  (chosen  here  to  be  r  =  10 
and  w  =  1/8),  and  x  =  ( Xi,X2 )  G  D.  Furthermore,  we  impose  no-flux 
boundary  conditions  on  the  walls  of  the  reservoir: 

u  •  n  =  0,  on  dD,  (37) 


where  n  is  the  unit  normal  vector  to  the  boundary.  These  boundary  condi¬ 
tions  specify  the  pressure  p  up  to  an  additive  constant.  To  assure  uniqueness 
of  the  boundary  value  problem  defined  by  Eqs.  (34),  (35)  and  (37),  we  impose 
the  constraint  [37]: 


/  p(x)dx  =  0. 
J  D 
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The  boundary  value  problem  is  solved  using  a  mixed  finite  element  formu¬ 
lation.  We  use  first-order  Raviart-Thomas  elements  for  the  velocity  [38] ,  and 
zero-order  discontinuous  elements  for  the  pressure  [39].  The  permeability  is 
defined  on  a  64  x  64  fine-grid  and  we  are  interested  in  the  physical  responses 
on  a  8  x  8  coarse-grid.  The  solver  was  implemented  using  the  Dolfin  C++  li¬ 
brary  [40].  The  eigenfunctions  of  the  exponential  random  field  used  to  model 
the  permeability  were  calculated  via  Stokhos  which  is  part  of  Trilinos  [41]. 

In  this  work,  the  final  responses  taken  into  consideration  include  x— velocity, 
ux,  y— velocity,  uy  and  pressure,  p.  We  assume  independence  of  the  multi¬ 
ple  output  responses  so  we  can  build  an  independent  graphical  model  for 
each  of  them.  This  is  typical  of  many  uncertainty  quantification  methods 
but  methodologies  where  correlations  are  accounted  can  be  considered  as 
well  [15].  As  previously  discussed,  the  constructed  graphical  model  is  a  non- 
parametric  model.  Then  naturally  an  important  question  arises  as  to  what 
the  proper  number  is  of  the  mixture  components  considered.  One  should 
avoid  to  choose  a  large  number  due  to  the  exponential  increase  for  the  com¬ 
putational  cost,  especially  at  the  loop  belief  propagation  step.  A  large  num¬ 
ber  of  mixture  components  does  not  necessarily  lead  to  better  results  and  for 
some  cases  may  lead  to  over-fitting.  In  the  context  of  the  examples  presented 
below,  three  mixture  components  were  shown  to  provide  an  adequate  choice 
for  the  accuracy  desired. 

6.1.  Stationary  random  field 

In  this  example,  the  log-permeability  is  considered  as  a  stationary  random 
field.  We  restrict  ourselves  to  an  isotropic  permeability  tensor: 


K%3  =  KSij. 


K  is  modeled  as 


K(x)  =  exp{G(x)}, 


where  G  is  a  Gaussian  random  field: 


G(-)  ~  A f(m,cG(-,-)), 


with  constant  mean  m  and  an  exponential  covariance  function  given  by 


(38) 
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The  parameter  l  represents  the  correlation  lengths  of  the  field,  while  vq  >  0 
is  its  variance.  In  order  to  obtain  a  finite  dimensional  representation  of  G,  we 
employ  the  Karhunen-Loeve  expansion  [16]  and  truncate  it  after  terms: 


h 

G(£;x)  =  m  +  (39) 

k=  1 

where  £  =  (£i,  ■  ■  ■  ,  £/t5)  is  a  vector  of  independent,  zero  mean  and  unit  vari¬ 
ance  Gaussian  random  variables  and  </>fc(x)  are  the  eigenfunctions  of  the 
exponential  covariance  given  in  Eq.  (38)  (suitably  normalized). 

The  values  we  choose  for  the  parameters  are  m  —  0,  l  —  0.1  and  vq  =  1  in 
Eq.  (38),  and  k ^  =  50  in  Eq.  (39).  In  the  following,  we  first  verify  the  model 
reduction  framework  in  Section  6.1.1  and  then  we  move  to  the  inference 
tasks  on  the  graph:  1)  given  the  input  distribution  of  £,  investigate  how 
the  uncertainty  propagates  to  the  response  in  Section  6.1.2;  2)  given  a  new 
permeability  field,  find  the  prediction  of  unobserved  responses  with  proper 
error  bars  in  Section  6.1.3. 

6.1.1.  Model  reduction 

As  discussed  in  Section  3,  PCA  model  reduction  technique  is  applied  to 
reduce  the  dimensionality  of  the  input  permeability  field.  Fig.  7  shows  the 
normalized  eigen-plot  and  energy-plot  for  the  PCA  reduction  for  the  input 
permeability  over  the  corresponding  coarse-elements  to  two  random  coarse- 
nodes.  Here,  “normalized”  means  that  each  eigenvalue  is  divided  by  the 
sum  of  all  the  eigenvalues.  As  shown  on  these  plots,  by  using  less  than  ten 
eigenvectors,  the  cumulative  preserved  energy  is  almost  one,  which  means 
a  ten- dimensional  random  variable  representation  is  enough  to  describe  the 
original  data  set.  In  addition,  we  compare  the  reconstructed  input  perme¬ 
ability  field  with  the  original  one  (the  dimension  of  the  original  permeability 
field  is  64  x  64  =  4096)  with  different  k ,  where  k  is  the  dimensionality  of  the 
reduced  space,  as  shown  in  Figs.  8  and  9.  The  major  differences  occur  along 
the  coarse-grid  boundaries.  This  is  indeed  expected  by  using  the  reconstruc¬ 
tion  strategy  discussed  in  Section  3.  It  is  clear  from  these  figures  that  as  the 
reduced  dimensionality  k  increases,  the  reconstructed  permeability  field  gets 
closer  to  the  original  microstructure.  Also  as  the  number  of  training  data  N 
used  increases,  the  reconstructed  permeability  becomes  closer  to  the  original 
realization. 
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Normalized  Eigenvalue 


1 


1 


Figure  7:  Stationary  random  field  -  Normalized  eigenspectrum  and  energy  plot  for  the 
input  permeability  in  two  random  subdomains. 
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Figure  8:  Stationary  random  field  -  Comparison  of  the  reconstructed  input  permeability 
field  with  the  original  given  sample  (a)  with  different  number  of  training  data  for  k  =  10, 
where  k  is  the  dimensionality  of  the  reduced  space;  (b)(d)(f)  The  reconstructed  input 
permeability  using  N  =  200,1000,  and  4000  training  data,  respectively;  (c)(e)(g)  The 
error  between  the  reconstructed  permeability  field  and  the  original  sample. 
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Figure  9:  Stationary  random  field  -  Comparison  of  the  reconstructed  input  permeability 
field  with  the  original  given  sample  (a)  with  different  k  for  N  =  1000;  (b)(d)(f)  The 
reconstructed  input  permeability  using  k  =  5, 10,  and  30,  respectively;  (c)(e)(g)  The  error 
between  the  reconstructed  permeability  field  and  the  original  sample. 
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Finally,  we  compare  the  reconstruction  error  of  the  input  permeability 
held  with  different  number  of  training  data  and  different  reduced  dimension¬ 
ality  k  (Fig.  10).  The  reconstruction  error  is  computed  by 


e  = 


1 

Ayv 


^||AW  -  A(i)||2 


i= 1 


(40) 


where  A  =  [a i ,  a2, ....  a/vf] ,  Nf  is  the  number  of  elements  on  the  fine- mesh 
and  A  =  [sp,  a2, . . . ,  aNf  ]  where  a,  is  the  reconstructed  permeability  on  the 
fine-element  e*.  The  superscript  ( i )  denotes  the  i  —  th  sample  and  N  is  the 
total  number  of  samples  used.  The  given  figure  indicates  that  the  number  of 
reduced  dimensionality  k  has  a  higher  impact  on  the  final  performance  of  the 
reconstruction  than  the  number  of  training  data.  The  reduced  dimensionality 
k  is  chosen  as  10  in  this  problem. 


-1 

r 


Figure  10:  Stationary  random  field  -  Comparison  of  the  reconstruction  error  of  the  input 
permeability  field  with  different  number  of  training  data,  and  different  reduced  dimension¬ 
ality  k. 


6.1.2.  Uncertainty  propagation 

In  this  section,  we  are  going  to  investigate  how  the  uncertainties  propagate 
from  the  input  permeability  to  the  output  (velocity  and  pressure)  response. 
After  the  graphical  model  is  completely  learnt  by  the  training  data,  we  send 
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the  distribution  of  S,  p( S),  which  is  calculated  from  the  known  input  dis¬ 
tribution  p(£),  to  the  graphical  model  as  the  input  message,  and  then  run 
the  nonparametric  belief  propagation  algorithm.  After  all  the  messages  in 
the  graph  converge,  we  compute  the  marginal  distribution  of  the  response 
variables  by  combining  all  the  messages  coming  into  the  response  variable  as 
in  Eq.  (24). 

Fig.  11  compares  the  predicted  mean  of  ux  with  a  Monte  Carlo  estimate 
using  105  observations.  We  can  clearly  see  that  as  the  number  of  training  data 
increases,  the  prediction  gets  more  and  more  accurate.  The  same  statistic 
for  uy  and  p  is  reported  in  Figs.  12  and  13,  respectively. 

Fig.  14  compares  the  predicted  variance  of  ux  to  a  Monte  Carlo  estimate 
using  105  observations.  Also,  the  predicted  variance  converges  to  the  MC 
results  with  the  increase  of  the  number  of  the  training  data.  The  same 
statistic  for  uy  and  p  is  given  in  Figs.  15  and  16,  respectively.  We  can  see 
that  N  =  1000  training  samples  can  already  give  rather  accurate  predictions 
for  the  marginal  mean  and  variance  of  the  responses.  Notice  that  in  this 
work,  the  predicted  marginal  probability  is  given  in  a  Gaussian  mixture  form 
with  three  components.  For  example,  the  response  at  one  coarse-grid  node, 
U(i,j)  (random  variable)  can  be  represented  as  y(i.j)  =  X^m=i w^iere 

2/ (NO  ~  A s  discussed  in  Appendix  D,  the  first-order  and 

second-order  statistics  can  be  obtained  by  E[ypj)]  =  and 

Var[y(i,j)}  =  El iK(3)2(ff£))2.  respectively. 

The  error  of  the  statistics  is  evaluated  using  the  (normalized)  L2  norm  of 
the  error  in  variance  defined  by: 


El2  = 


N 


Ng 


N, 


G 


Ffa.MC  “  «i)2. 


(41) 


i— 1 


where  v^mc  is  the  Monte  Carlo  estimate  of  the  variance  of  the  response  on 
the  f-th  coarse-node  using  105  samples,  and  v%  is  the  predictive  variance  given 
by  the  graphical  model.  In  Fig.  (17),  we  plot  the  L2  norm  of  the  error  as  a 
function  of  the  number  of  samples  for  ux,  uy  and  p  and  a  comparison  with 
the  MC  results  is  shown.  In  addition,  we  compare  the  predicted  probability 
densities  of  ux,  uy  and  p  at  physical  positions  (0.429,  0.429)  and  (0.571,  0.571), 
with  the  PDFs  obtained  from  the  MC  estimate  using  105  observations,  as 
shown  in  Figs.  18  and  Fig.  19,  respectively.  From  the  figures,  we  can  see  that 
the  PDFs  do  not  have  symmetric  tails,  so  obviously,  they  are  not  Gaussian 
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Figure  11:  Stationary  random  field  -  Mean  of  ux:  (a)  MC  estimate  using  105  observations; 

(b) (d)(f)  The  predicted  mean  of  ux  using  100, 400,  and  1000  training  samples,  respectively; 

(c) (e)(g)  The  error  between  the  predicted  mean  and  the  MC  mean  for  N  =  100,400,  and 
1000,  respectively. 
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Figure  12:  Stationary  random  field  -  Mean  of  uy:  (a)  MC  estimate  using  105  observations; 

(b) (d)(f)  The  predicted  mean  of  uv  using  100, 400,  and  1000  training  samples,  respectively; 

(c) (e)(g)  The  error  between  the  predicted  mean  and  the  MC  mean  for  N  =  100,400,  and 
1000,  respectively. 
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Figure  13:  Stationary  random  field  -  Mean  of  p:  (a)  MC  estimate  using  105  observations; 

(b) (d)(f)  The  predicted  mean  of  p  using  100,400,  and  1000  training  samples,  respectively; 

(c) (e)(g)  The  error  between  the  predicted  mean  and  the  MC  mean  for  N  =  100,400,  and 
1000,  respectively. 
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Figure  14:  Stationary  random  field  -  Variance  of  ux:  (a)  MC  estimate  using  105  observa¬ 
tions;  (b)(d)(f)  The  predicted  variance  of  ux  using  100,400,  and  1000  training  samples, 
respectively;  (c)(e)(g)  The  error  between  the  predicted  variance  and  the  MC  variance  for 
N  =  100,400,  and  1000,  respectively. 
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Figure  15:  Stationary  random  field  -  Variance  of  uy :  (a)  MC  estimate  using  105  observa¬ 
tions;  (b)(d)(f)  The  predicted  variance  of  uv  using  100,400,  and  1000  training  samples, 
respectively;  (c)(e)(g)  The  error  between  the  predicted  variance  and  the  MC  variance  for 
N  =  100,400,  and  1000,  respectively. 
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Figure  16:  Stationary  random  field  -  Variance  of  p:  (a)  MC  estimate  using  105  obser¬ 
vations;  (b)(d)(f)  The  predicted  variance  of  p  using  100,400,  and  1000  training  samples, 
respectively;  (c)(e)(g)  The  error  between  the  predicted  variance  and  the  MC  variance  for 
N  =  100,400,  and  1000,  respectively. 
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Figure  17:  Stationary  random  field  -  The  L2  norm  of  the  error  as  a  function  of  the  number 
of  samples  observed  for  graphical  model  framework. 

distributions.  This  is  especially  true  for  the  velocity  components  that  should 
be  positive.  As  the  number  of  observations  increases,  we  can  observe  that 
the  graphical  model  prediction  gradually  captures  the  major  key  features  of 
the  PDFs. 

6.1.3.  Response  Prediction 

In  this  section,  we  will  show  that  the  constructed  graphical  model  is  also 
capable  of  acting  as  a  surrogate  model  of  the  deterministic  solver.  The  prob¬ 
lem  can  be  described  as  follows:  Given  a  new  observation  of  the  permeability 
field,  a,  the  objective  is  to  obtain  the  conditional  distribution  p(Y|A  =  a). 
With  a  new  realization  of  the  permeability  field  a,  we  first  compute  the 
localized  reduced  input  variables  S(a).  After  that,  instead  of  using  the  dis¬ 
tribution  of  S,  p( S),  we  use  a  Kronecker  Delta  function  <5s(S(a))  as  the  input 
message,  and  we  send  it  to  the  pre-learned  graphical  model  to  execute  the 
uonparametric  belief  propagation  algorithm.  Notice  that  now,  all  the  poten¬ 
tial  functions  involving  the  S  variable  need  to  be  multiplied  by  the  Kronecker 
Delta  function  hs(S(a)),  as  discussed  in  Section  5.2.1. 

Figs.  20,  21,  and  22  show  a  comparison  of  the  predicted  ux ,  uy  and  p  fields, 
respectively,  with  the  results  of  the  deterministic  solver  for  given  a  new  input 
permeability  field  a.  This  permeability  sample  was  generated  from  the  same 
process  as  the  training  data.  The  predictions  are  given  by  the  nonparametric 
model  using  different  number  of  training  data.  As  the  number  of  training 
data  ( N )  increases,  the  predictions  gradually  converge  to  the  true  response 
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(a) 


(b) 


(c) 


Figure  18:  Stationary  random  field  -  Comparison  of  the  predicted  PDFs  using  different 
training  data  with  the  MC  estimate  at  physical  position  ((0.429,0.429)):  (a)  ux,  (b)  uy, 
(c)  p. 
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(C) 


Figure  19:  Stationary  random  field  -  Comparison  of  the  predicted  PDFs  using  different 
training  data  with  the  MC  estimate  at  physical  position  (0.571,  0.571):  (a)  ux,  (b)  uy,  (c) 
V- 
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fields.  Note  that  in  the  learning  process,  the  unknown  parameters  in  the 
potential  functions  converge  after  some  N  and  using  more  data  does  not 
improve  considerably  the  final  predictions. 

6.2.  Non- stationary  random  field 

In  the  previous  example,  it  was  assumed  that  the  permeability  field  con¬ 
sidered  was  stationary  such  that  the  covariance  between  any  two  points  in  the 
domain  depends  on  their  distance  rather  than  their  actual  locations.  How¬ 
ever,  hydraulic  properties  may  exhibit  spatial  variations  at  various  scales. 
Therefore,  it  is  important  to  extend  the  probabilistic  graphical  model  to  non¬ 
stationary  random  fields.  In  this  example,  we  use  a  non-stationary  random 
field  as  stochastic  input.  The  log-permeability  on  the  k- th  coarse-element  is 
still  a  Gaussian  random  field  with  mean  zero  and  an  exponential  covariance 
function,  as  given  in  Eq.  (38): 


(42) 


However,  the  correlation  length  in  the  non-stationary  case  is  not  a  con¬ 
stant  anymore.  Since  the  coarse-grid  has  Nx  =  8  rows  and  Ny  —  8  columns 
of  elements,  we  define  the  coordinate  of  the  k- th  element  as  (ik,jk)  where  ik 
is  the  index  in  row  and  jk  is  the  index  in  column.  Then  the  correlation  length 
is  set  to  be  Ik, i  =  0.1  +  and  4,2  =  0.1  +  j^jik-  The  source  term  /  is 

set  to  zero.  Flow  is  induced  from  left  to  right  side  with  Dirichlet  boundary 
conditions  p  —  1  on  x  =  0,  p  =  0  on  y  =  1.  No— flow  Neumann  boundary 
conditions  are  applied  on  the  other  two  sides  of  the  square  domain. 

In  this  example,  we  investigate  the  non-stationary  problem  in  a  similar 
way  as  the  previous  stationary  case.  First,  we  verify  the  model  reduction 
framework  in  Section  6.2.1  and  then  we  investigate  the  uncertainty  propa¬ 
gation  from  the  inputs  to  the  responses  in  Section  6.2.2.  Finally,  we  use  the 
graphical  model  to  predict  the  responses  given  a  new  permeability  field,  in 
Section  6.2.3. 

6.2.1.  Model  reduction 

In  this  section,  we  first  compare  the  reconstructed  input  permeability 
field  with  the  original  one  for  different  k,  where  k  is  the  dimensionality  of 
the  reduced  space,  as  shown  in  Fig.  23.  Fig.  24  shows  the  comparison  of  the 
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Figure  20:  Stationary  random  field  -  Comparison  of  the  prediction  of  ux  given  a  new  input 
permeability  field  using  the  nonparametric  model  versus  the  true  ux  from  the  deterministic 
solver:  (a)  The  new  observed  input  permeability  field;  (b)  The  contour  plot  of  the  true 
ux  field;  (c)(f)(i)  The  predicted  mean  by  the  nonparametric  graphical  model  using  N  = 
400, 1000  and  4000  training  data,  respectively;  (d)(g)(j)  The  corresponding  predictive 
variance  for  N  =  400, 1000  and  4000  cases,  respectively;  (e)(h)(k)  The  error  between  the 
predicted  mean  and  the  true  response  field  for  N  =  400, 1000  and  4000  cases,  respectively. 
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Figure  21:  Stationary  random  field  -  Comparison  of  the  prediction  of  uy  given  a  new 
input  permeability  field  using  the  nonparametric  model  versus  the  true  uy  from  the  de¬ 
terministic  solver:  (a)  The  new  observed  input  permeability  field;  (b)  The  contour  plot 
of  the  true  uy  field;  (c)(f)(i)  The  predicted  mean  by  the  nonparametric  graphical  model 
using  N  =  400,1000  and  4000  training  data,  respectively;  (d)(g)(j)  The  corresponding 
predictive  variance  for  N  =  400,1000  and  4000  cases,  respectively;  (e)(h)(k)  The  differ¬ 
ence  between  the  predicted  mean  and  the  true  response  field  for  N  =  400, 1000  and  4000 
cases,  respectively. 
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Figure  22:  Stationary  random  field  -  Comparison  of  the  prediction  of  p  given  a  new  input 
permeability  field  using  the  nonparametric  model  versus  the  true  p  from  the  determin¬ 
istic  solver:  (a)  The  new  observed  input  permeability  field;  (b)  The  contour  plot  of  the 
true  p  field;  (c)(f)(i)  The  predicted  mean  by  the  nonparametric  graphical  model  using 
N  =  400,1000  and  4000  training  data,  respectively;  (d)(g)(j)  The  corresponding  predic¬ 
tive  variance  for  N  =  400,1000  and  4000  cases,  respectively;  (e)(h)(k)  The  difference 
between  the  predicted  mean  and  the  true  response  field  for  N  =  400, 1000  and  4000  cases, 
respectively. 
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reconstructed  input  permeability  field  with  the  original  given  sample  for  dif¬ 
ferent  number  of  training  samples  N.  In  comparison  with  the  reconstruction 
results  in  the  previous  example  in  Section  6.1.1,  a  higher  k  is  needed  here  to 
obtain  a  relatively  good  reconstruction.  This  is  expected  because  the  non¬ 
stationary  permeability  field  is  much  more  complicated  than  the  stationary 
case.  We  obtain  the  similar  conclusions  from  these  figures  as  in  the  earlier 
example.  As  the  reduced  dimensionality  k  increases,  the  reconstructed  per¬ 
meability  field  gets  closer  to  the  original  permeability.  Also  as  the  number 
of  training  data  N  used  increases,  the  reconstructed  permeability  becomes 
closer  to  the  original  realization. 


Figure  23:  Non-stationary  random  field  -  Comparison  of  the  reconstructed  input  perme¬ 
ability  field  with  the  original  given  sample:  (a)  With  different  k  for  N  =  2000;  (b)(c)(d) 
The  reconstructed  input  permeability  using  k  =  10,30,  and  50,  respectively. 
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Figure  24:  Non-stationary  random  field  -  Comparison  of  the  reconstructed  input  perme¬ 
ability  field  with  the  original  given  sample:  (a)  With  different  number  of  training  data  for 
k  =  30,  where  k  is  the  dimensionality  of  the  reduced  space;  (b)(d)(f)  The  reconstructed 
input  permeability  using  N  =  1000,  2000,  and  4000  training  data,  respectively. 


Finally,  we  compare  the  reconstruction  error  of  the  input  permeability 
field  with  different  number  of  training  data  N  and  different  reduced  dimen¬ 
sionality  k  using  Eq.  (40)  in  section  6.1.1,  as  shown  in  Fig.  25.  In  this 
example,  k  is  chosen  as  20,  that  is,  the  dimensionality  of  each  s  variable  is 
20. 
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Number  of  training  samples 

Figure  25:  Non-stationary  random  field  -  Comparison  of  the  reconstruction  error  of  the 
input  permeability  field  with  different  number  of  training  data,  and  different  reduced 
dimensionality  k. 

6.2.2.  Uncertainty  propagation 

In  this  section,  we  are  also  going  to  investigate  how  the  uncertainty  prop¬ 
agate  from  the  input  permeability  to  the  output  (velocity  and  pressure)  re¬ 
sponse,  as  in  Section  6.1.2.  Figure  26  compares  the  predicted  mean  of  ux 
with  a  Monte  Carlo  estimate  using  105  observations.  We  can  clearly  see  that 
as  the  number  of  training  data  increases,  the  prediction  gets  more  and  more 
accurate.  The  same  statistic  for  uy  and  p  is  reported  in  Figs.  27  and  28, 
respectively. 

Fig.  29  compares  the  predicted  variance  of  ux  to  a  Monte  Carlo  estimate 
using  105  observations.  Also,  the  predicted  variance  converges  to  the  MC 
results  with  the  increase  of  the  number  of  the  training  data.  The  same 
statistic  for  uy  and  p  is  given  in  Figs.  30  and  31,  respectively.  We  can 
see  that  in  this  example,  N  =  4000  training  samples  can  only  provide  a 
reasonable  predictions  for  the  marginal  mean  and  variance  of  the  responses, 
while  in  the  previous  stationary  example  in  section  6.1.2,  N  =  1000  training 
samples  can  already  give  rather  accurate  predictions.  The  predictions  of  the 
pressure,  compared  to  the  predictions  of  velocity,  are  much  more  accurate, 
this  is  because  the  pressure  has  less  variability  than  the  velocity  in  porous 
media  flow  problem. 
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Figure  26:  Non-stationary  random  field  -  Mean  of  ux:  (a)  MC  estimate  using  105  ob¬ 
servations;  (b)(d)(f)  The  predicted  mean  of  ux  using  400,1000,  and  4000  training  sam¬ 
ples,  respectively;  (c)(e)(g)  The  error  between  the  predicted  mean  and  the  MC  mean  for 
N  =  400, 1000,  and  4000,  respectively. 
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Figure  27:  Non-stationary  random  field  -  Mean  of  uy :  (a)  MC  estimate  using  105  ob¬ 
servations;  (b)(d)(f)  The  predicted  mean  of  ux  using  400,1000,  and  4000  training  sam¬ 
ples,  respectively;  (c)(e)(g)  The  error  between  the  predicted  mean  and  the  MC  mean  for 
N  =  400, 1000,  and  4000,  respectively. 
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Figure  28:  Mean  of  p:  (a)  MC  estimate  using  105  observations;  (b)(d)(f)  The  predicted 
mean  of  ux  using  400,1000,  and  4000  training  samples,  respectively;  (c)(e)(g)  The  error 
between  the  predicted  mean  and  the  MC  mean  for  N  =  400, 1000,  and  4000,  respectively. 
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Figure  29:  Non-stationary  random  field  -  Variance  of  ux :  (a)  MC  estimate  using  105 
observations;  (b)(d)(f)  The  predicted  variance  of  ux  using  400,1000,  and  4000  training 
samples,  respectively;  (c)(e)(g)  The  error  between  the  predicted  variance  and  the  MC 
variance  for  N  =  400, 1000,  and  4000,  respectively. 
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Figure  30:  Non-stationary  random  field  -  Variance  of  uy :  (a)  MC  estimate  using  105 
observations;  (b)(d)(f)  The  predicted  variance  of  ux  using  400,1000,  and  4000  training 
samples,  respectively;  (c)(e)(g)  The  error  between  the  predicted  variance  and  the  MC 
variance  for  N  =  400, 1000,  and  4000,  respectively. 
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Figure  31:  Non-stationary  random  field  -  Variance  oi  p:  (a)  MC  estimate  using  105  obser¬ 
vations;  (b)(d)(f)  The  predicted  variance  of  ux  using  400, 1000,  and  4000  training  samples, 
respectively;  (c)(e)(g)  The  error  between  the  predicted  variance  and  the  MC  variance  for 
N  =  400, 1000,  and  4000,  respectively. 
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Figure  32:  Non-stationary  random  field  -  The  L2  norm  of  the  error  as  a  function  of  the 
number  of  samples  observed  for  graphical  model  framework. 

Similarly,  in  Fig.  32,  we  plot  the  L2  norm  (Eq.  (41))  of  the  error  as  a 
function  of  the  number  of  samples  for  ux,  uy  and  p  and  compare  with  the 
MC  results.  In  addition,  we  compare  the  predicted  probability  densities  of 
ux,  uy  and  p  at  physical  positions  (0.429,0.429)  and  (0.571,0.571),  with  the 
PDFs  obtained  from  the  MC  estimate  using  105  observations,  as  shown  in 
Figs.  33  and  Fig.  34,  respectively.  From  the  figures,  we  can  see  that  as  the 
number  of  observations  increases,  the  graphical  model  prediction  gradually 
captures  the  major  key  features  of  the  PDFs. 

6.2.3.  Response  Prediction 

In  this  section,  we  also  provide  an  example  to  demonstrate  that  the  con¬ 
structed  graphical  model  is  capable  of  acting  as  a  surrogate  model  for  the 
deterministic  solver,  as  in  section  6.1.3.  Fig.  35  shows  a  comparison  of  the 
predicted  ux,  uy  and  p  fields,  with  the  results  of  the  deterministic  solver  for 
given  a  new  input  permeability  held  a,  using  N  =  4000  training  samples. 
Fig.  36  gives  the  comparison  results  for  another  realization.  As  shown  from 
the  figures,  the  predictions  capture  the  main  features  of  the  responses. 

7.  Discussion  and  Conclusions 

A  probabilistic  graphical  model  framework  was  developed  to  address  the 
uncertainty  propagation  problem  for  hows  in  porous  media.  The  framework 
could  quantify  the  uncertainties  propagating  from  the  random  input  to  the 
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(a)  (b) 


Figure  33:  Non-stationary  random  field  -  Comparison  of  the  predicted  PDFs  using  different 
training  data  with  the  MC  estimate  at  physical  position  ((0.429,0.429)):  (a)  ux,  (b)  uy, 
(c)  p. 
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(a)  (b) 


(c) 


Figure  34:  Non-stationary  random  field  -  Comparison  of  the  predicted  PDFs  using  different 
training  data  with  the  MC  estimate  at  physical  position  (0.571,  0.571):  (a)  ux,  (b)  uy,  (c) 
V- 
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Figure  35:  Non-stationary  random  field  -  Comparison  of  the  predicted  physical  responses 
given  a  realization  of  stochastic  input  permeability  with  the  true  response:  (a)  The  new 
observed  input  permeability  field;  (b)(e)(h)  The  true  responses  for  the  given  permeability 
realization,  from  top  to  bottom,  uX:  uy  and  p,  respectively;  (c)(f)(i)  The  predicted  means 
for  ux,  uy  and  p  by  graphical  model  using  N  =  4000  training  data,  respectively;  (d)(g)(j) 
The  difference  between  the  predicted  mean  and  the  true  response  field  for  ux,  uy  and  p , 
respectively. 
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Figure  36:  Non-stationary  random  field  -  Comparison  of  the  predicted  physical  responses 
given  a  realization  of  stochastic  input  permeability  with  the  true  response:  (a)  The  new 
observed  input  permeability  field;  (b)(e)(h)  The  true  responses  for  the  given  permeability 
realization,  from  top  to  bottom,  ux,  uy  and  p,  respectively;  (c)(f)(i)  The  predicted  means 
for  ux,  uy  and  p  by  graphical  model  using  N  =  4000  training  data,  respectively;  (d)(g)(j) 
The  difference  between  the  predicted  mean  and  the  true  response  field  for  ux,  uy  and  p, 
respectively. 
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multi-output  system  response.  The  high  dimensionality  nature  of  the  rela¬ 
tionship  between  the  inputs  and  responses  was  addressed  by  breaking  the 
global  problem  into  small  local  problems  posed  over  coarse-elements.  The 
whole  framework  was  designed  to  be  nonparametric  (Gaussian  mixture),  so 
it  was  capable  of  capturing  non-Gaussian  features  and  thus  it  should  have  a 
wider  applicability  to  other  multiscale  problems.  The  graphical  model  was 
shown  that  it  can  serve  as  a  surrogate  model  for  predicting  the  responses  for 
any  new  observed  permeability  input. 

Various  examples  were  considered  to  study  the  accuracy  and  efficiency 
of  the  probabilistic  graphical  model  framework  and  inference  algorithms.  It 
was  shown  that  this  framework  is  capable  of  predicting  the  correct  output 
statistics  with  rather  limited  number  of  observations.  In  the  provided  ex¬ 
amples,  it  was  shown  to  capture  well  the  first-  and  second-order  statistics, 
and  also  provided  reasonable  predictions  of  the  PDFs  of  the  outputs.  The 
framework  can  be  used  to  address  inverse  problems  (e.g.  from  limited  output 
data  predict  unobservable  permeability  information).  Such  inverse  problems 
and  extending  the  applicability  of  the  framework  to  other  critical  engineering 
applications  are  topics  of  current  research  interest. 

A.  PCA  model  reduction 

The  basic  algorithm  for  the  PCA  method  used  is  briefly  illustrated  as 
follows:  Consider  a  set  of  D  dimensional  data  X  :  =  {x(n);  n  =  1,  •  •  • ,  N} 
where  N  is  the  number  of  data.  We  first  compute  the  mean  as  x  =  E[x]  = 
T  ^2^=1  x*'n'>  and  the  covariance  matrix  as  C  =  -A  ^^(x^  —  x)(xhh  —  x)T. 
The  eigenvalues  A  *  and  eigenvectors  v*  of  the  covariance  matrix  C  are  then 
computed.  We  sort  the  eigenvalues  A,,  in  a  descending  order,  and  take  the  first 
corresponding  K  eigenvectors  to  assemble  the  transform  matrix  T.  Finally, 
we  map  the  original  data  set  to  a  A-dimensional  space  via  the  transform 
matrix  T  as 


£(n)  =  ft(xW)  =  TKxD(*{n)  -  x),  n  =  1, . . . ,  N.  (43) 

The  random  vector  £  :=  [G,  •  •  • ,  Gd  satisfies  the  following  two  conditions: 

E[G]=0,  E[GG]  =  Sijjj,  =  (44) 

Therefore,  the  random  coefficients  G  are  uncorrelated  but  not  independent. 
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(45) 


The  reconstruction  is  a  reverse  process  to  the  reduction  process: 


is  con¬ 


structed  using  only  the  first  K  eigenvectors,  and  we  use  X#  to  denote  the 
reconstructed  random  variable. 

B.  Expectation  Maximization  Algorithm 

EM  [42,  43]  is  an  elegant  and  powerful  method  to  find  maximum  likelihood 
solutions  for  mixture  models.  The  log-likelihood  function  is  given  by 


(46) 


The  log-likelihood  is  calculated  based  as  in  [44],  The  detailed  steps  are 
described  in  Algorithm  3.  Notice  that  in  our  implementation  of  the  EM 
algorithm,  the  number  of  mixture  components  M  was  kept  fixed. 

C.  Metropolis  Hastings  algorithm 

The  Metropolis  Hastings  (MH)  algorithm  can  draw  samples  from  any 
probability  distribution,  especially,  it  can  generate  samples  without  knowing 
the  normalization  constant  [25].  Therefore,  in  this  work,  we  use  MH  algo¬ 
rithm  to  generate  samples  from  — '"u  TJ|) —  in  Eq.  (29).  In  the  following,  for 
mathematical  convenience,  we  use  x  to  denote  the  random  variable  spm,  and 
P(x)  to  denote  the  distribution  — P'S(T)'>  v 

The  detailed  steps  are  given  in  Algorithm  4.  The  main  disadvantage  of 
the  MH  algorithm  is  that  the  samples  generated  are  correlated.  Even  though 
over  the  long  term  they  do  correctly  follow  P(x),  a  set  of  nearby  samples 
will  be  correlated  with  each  other  and  not  correctly  reflect  the  distribution. 
This  means  that  if  we  want  a  set  of  independent  samples,  we  have  to  throw 
away  the  majority  of  samples  and  only  take  every  n  —  th  sample,  for  some 
value  of  n  (in  this  work,  we  set  n  —  5).  In  addition,  although  the  Markov 
chain  eventually  converges  to  the  desired  distribution,  the  initial  samples 
may  follow  a  very  different  distribution,  especially  if  the  starting  point  is  in 
a  region  of  low  density.  So  a  “burn-in”  period  is  needed,  where  an  initial 
number  of  samples  (e.g.  the  first  1,  000)  are  thrown  away. 
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Algorithm  3  The  EM  Algorithm 

1:  Initialize  the  means  fi!rn\  covariance  and  mixing  coefficient  u/m\ 
and  evaluate  the  initial  value  of  the  log  likelihood. 

2:  E-step:  Evaluate  the  responsibilities  using  the  current  parameter  values 

3:  M-step:  Re-estimate  the  parameters  using  the  current  responsibilities: 


UM  — 

r^new 

l 

Am 

y(m)  _ 

l 

new 

Am 

u;(m)  = 

Am 

new 

A 

N 


(rim)zh 

i—  1 
N 


^  ^  7 (rim)  {%i 
i= 1 


r^new 


)0* 


r^new)  ’ 


where 


(48) 

(49) 

(50) 


N 

N m  ^  ^  • 

2—1 


(51) 


4:  Evaluate  the  log  likelihood  given  in  Eq.  (46)  using  the  current  parame¬ 
ters  and  check  for  convergence  of  the  log-likelihood.  If  the  convergence 
criterion  is  not  satisfied,  then  return  to  Step  2. 


61 


Algorithm  4  The  Metropolis  Hastings  Algorithm 
1:  Pick  an  arbitrary  probability  density  Q(x'\xt),  where  Q  is  the  proposal 
jumping  distribution,  which  suggests  a  new  sample  value  x'  given  a  sam¬ 
ple  value  Xt-  Here,  we  choose  a  widely  used  symmetric  jumping  distribu¬ 
tion  -  Gaussian  distribution  centered  at  xt. 

2:  Start  with  some  arbitrary  point  xo  as  the  first  sample. 

3:  To  generate  a  new  sample  xt+i  given  the  most  recent  sample  xt,  proceed 
as  follows: 

(1)  Generate  a  proposed  new  sample  value  x'  from  the  jumping  distri¬ 
bution  Q{x'\xt)- 

(2)  Calculate  the  acceptance  ratio  as: 

P(x') 

T  =  - . 

Pfa) 


(3)  If  r  >  1,  accept  x'  by  setting  xt+i  =  x'. 

(4)  Else,  accept  x'  with  probability  r.  That  is,  pick  a  uniformly  dis¬ 
tributed  random  number  u  ~  U[ 0, 1],  and  if  u  <  r  set  xt+i  =  x', 
else  set  xt+\  =  xt- 
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D.  Proof  of  estimated  variance  for  Gaussian  mixture 

Assume  y  is  a  random  variable  that  can  be  represented  as  y  — 
where  ym  ~  A/"(/im,  cr^J.  The  mean  of  y  can  be  calculated  as 


M 

E[j/]  =  ^  UmPm-  (53) 

m—  1 

And  the  variance  of  y  can  be  calculated  as 
Var[y]  =  E [y2]  -  (K[y])2 

M  M 
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(54) 


m=l 


E.  Gaussian  Mixture  Reduction 


Given  a  Ar-components  Gaussian  mixture,  we  want  to  find  an  effectively 
reduced  Gaussian  mixture  form  without  losing  too  much  information  from 
the  original  Gaussian  mixture.  The  problem  can  be  defined  as  follows: 

N  M 

f(x)  =  ^Wi  ■  =>  f(x)  =  •  A/'(x;/ij,Ei).  (55) 

i= 1  J=1 

General  approaches  dealing  with  the  problem  of  Gaussian  mixture  reduc¬ 
tion  can  be  classified  into  two  fields.  Bottom-up  approaches  start  with  a 
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single  Gaussian  function  and  iteratively  add  additional  components  until  the 
original  mixture  density  is  approximated  appropriately  (e.g.  PGMR  [45]). 
Top-down  approaches  take  the  original  Gaussian  mixture  density  and  it¬ 
eratively  decrease  the  number  of  mixture  components,  either  by  remov¬ 
ing  single  unimportant  components  or  by  merging  similar  components  (e.g. 
Salmond’s  algorithm  [46]).  In  addition,  these  algorithms  can  be  further  di¬ 
vided  into  local  and  global  methods.  Gaussian  mixture  reduction  via  clus¬ 
tering  (GMRC  [47])  method  can  be  classified  as  a  top-down  algorithm  using 
global  information.  The  interested  reader  can  refer  to  [47]  for  the  detailed 
algorithm. 
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